movie script
Three Stage Narrative Analysis; Plot-Sentiment Breakdown, Structure Learning and Concept Detection
Khan, Taimur, Ahsan, Ramoza, Hameed, Mohib
Story understanding and analysis have long been challenging areas within Natural Language Understanding. Automated narrative analysis requires deep computational semantic representations along with syntactic processing. Moreover, the large volume of narrative data demands automated semantic analysis and computational learning rather than manual analytical approaches. In this paper, we propose a framework that analyzes the sentiment arcs of movie scripts and performs extended analysis related to the context of the characters involved. The framework enables the extraction of high-level and low-level concepts conveyed through the narrative. Using dictionary-based sentiment analysis, our approach applies a custom lexicon built with the LabMTsimple storylab module. The custom lexicon is based on the Valence, Arousal, and Dominance scores from the NRC-VAD dataset. Furthermore, the framework advances the analysis by clustering similar sentiment plots using Ward's hierarchical clustering technique. Experimental evaluation on a movie dataset shows that the resulting analysis is helpful to consumers and readers when selecting a narrative or story.
- North America > United States > Massachusetts (0.04)
- Asia > Pakistan > Islamabad Capital Territory > Islamabad (0.04)
- North America > United States > Vermont (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Information Technology (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.69)
CHATTER: A Character Attribution Dataset for Narrative Understanding
Baruah, Sabyasachee, Narayanan, Shrikanth
Computational narrative understanding studies the identification, description, and interaction of the elements of a narrative: characters, attributes, events, and relations. Narrative research has given considerable attention to defining and classifying character types. However, these character-type taxonomies do not generalize well because they are small, too simple, or specific to a domain. We require robust and reliable benchmarks to test whether narrative models truly understand the nuances of a character's development in the story. Our work addresses this by curating the Chatter dataset, which labels whether a character portrays some attribute, covering 88,148 character-attribute pairs that encompass 2,998 characters, 13,324 attributes, and 660 movies. We validate a subset of Chatter, called ChatterEval, using human annotations to serve as an evaluation benchmark for the character attribution task in movie scripts. ChatterEval assesses narrative understanding and the long-context modeling capacity of language models.
- North America > United States > California (0.14)
- North America > Dominican Republic (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
Do LLMs Know to Respect Copyright Notice?
Xu, Jialiang, Li, Shenglan, Xu, Zhaozhuo, Zhang, Denghui
Prior studies show that LLMs sometimes generate content that violates copyright. In this paper, we study another important yet underexplored problem, i.e., will LLMs respect copyright information in user input, and behave accordingly? The research problem is critical, as a negative answer would imply that LLMs will become the primary facilitator and accelerator of copyright infringement behavior. We conducted a series of experiments using a diverse set of language models, user prompts, and copyrighted materials, including books, news articles, API documentation, and movie scripts. Our study offers a conservative evaluation of the extent to which language models may infringe upon copyrights when processing user input containing protected material. This research emphasizes the need for further investigation and the importance of ensuring LLMs respect copyright regulations when handling user input to prevent unauthorized use or reproduction of protected content. We also release a benchmark dataset serving as a test bed for evaluating infringement behaviors by LLMs and stress the need for future alignment.
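The behavior the paper probes can be illustrated with a minimal input-side guard: scan user input for an explicit copyright notice and refuse verbatim reproduction when one is present. The patterns and refusal policy below are hypothetical, not from the paper.

```python
# Minimal sketch of a copyright-notice check on user input before generation.
import re

NOTICE_PATTERNS = [
    r"©",
    r"\bcopyright\b",
    r"\ball rights reserved\b",
]

def contains_copyright_notice(text):
    """True if the text carries a recognizable copyright notice."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in NOTICE_PATTERNS)

def handle_request(user_input, task):
    """Refuse verbatim-reproduction requests over noticed material."""
    if task == "reproduce" and contains_copyright_notice(user_input):
        return "Refused: the input carries a copyright notice."
    return f"Proceeding with task: {task}"

print(handle_request("INT. HOUSE - NIGHT ... © 2023 Studio. All rights reserved.",
                     "reproduce"))
```

A real system would need far more than keyword matching (the paper finds models often ignore such notices), but this makes the evaluated behavior concrete.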
- North America > United States > Missouri (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Minnesota (0.04)
- Media > Film (1.00)
- Law > Intellectual Property & Technology Law (1.00)
SHARE: Shared Memory-Aware Open-Domain Long-Term Dialogue Dataset Constructed from Movie Script
Kim, Eunwon, Park, Chanho, Chang, Buru
Shared memories between two individuals strengthen their bond and are crucial for facilitating their ongoing conversations. This study aims to make long-term dialogue more engaging by leveraging these shared memories. To this end, we introduce a new long-term dialogue dataset named SHARE, constructed from movie scripts, which are a rich source of shared memories among various relationships. Our dialogue dataset contains the summaries of persona information and events of two individuals, as explicitly revealed in their conversation, along with implicitly extractable shared memories. We also introduce EPISODE, a long-term dialogue framework based on SHARE that utilizes shared experiences between individuals. Through experiments using SHARE, we demonstrate that shared memories between two individuals make long-term dialogues more engaging and sustainable, and that EPISODE effectively manages shared memories during dialogue. Our new dataset is publicly available at https://anonymous.4open.science/r/SHARE-AA1E/SHARE.json.
- North America > United States > California (0.04)
- Asia (0.04)
- Leisure & Entertainment (1.00)
- Media > Television (0.46)
- Media > Film (0.46)
- Information Technology > Decision Support Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
DiscoGraMS: Enhancing Movie Screen-Play Summarization using Movie Character-Aware Discourse Graph
Chitale, Maitreya Prafulla, Bindal, Uday, Rajkumar, Rajakrishnan, Mishra, Rahul
Summarizing movie screenplays presents a unique set of challenges compared to standard document summarization. Screenplays are not only lengthy, but also feature a complex interplay of characters, dialogues, and scenes, with numerous direct and subtle relationships and contextual nuances that are difficult for machine learning models to accurately capture and comprehend. Recent attempts at screenplay summarization focus on fine-tuning transformer-based pre-trained models, but these models often fall short in capturing long-term dependencies and latent relationships, and frequently encounter the "lost in the middle" issue. To address these challenges, we introduce DiscoGraMS, a novel resource that represents movie scripts as a movie character-aware discourse graph (CaD Graph). This approach is well-suited for various downstream tasks, such as summarization, question-answering, and salience detection. The model aims to preserve all salient information, offering a more comprehensive and faithful representation of the screenplay's content. We further explore a baseline method that combines the CaD Graph with the corresponding movie script through a late fusion of graph and text modalities, and we present initial promising results.
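A toy version of a character-aware scene graph helps picture the idea: scenes become nodes, and two scenes are linked when they share a character. The actual CaD Graph construction (discourse relations, dialogue structure) is far richer than this sketch.

```python
# Toy character-aware scene graph: link scenes that share a character.
from collections import defaultdict
from itertools import combinations

scenes = {  # scene id -> set of characters appearing in it (illustrative data)
    "s1": {"ALICE", "BOB"},
    "s2": {"BOB", "CAROL"},
    "s3": {"DAVE"},
}

graph = defaultdict(set)
for a, b in combinations(scenes, 2):
    if scenes[a] & scenes[b]:   # shared character => link the two scenes
        graph[a].add(b)
        graph[b].add(a)

print(dict(graph))  # s1 and s2 are connected via BOB; s3 stays isolated
```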
- Asia > Thailand > Bangkok > Bangkok (0.05)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
Select and Summarize: Scene Saliency for Movie Script Summarization
Abstractive summarization for long-form narrative texts such as movie scripts is challenging due to the computational and memory constraints of current language models. A movie script typically comprises a large number of scenes; however, only a fraction of these scenes are salient, i.e., important for understanding the overall narrative. Scene salience can be operationalized by treating a scene as salient if it is mentioned in the summary. Automatically identifying salient scenes is difficult due to the lack of suitable datasets. In this work, we introduce a scene saliency dataset that consists of human-annotated salient scenes for 100 movies. We propose a two-stage abstractive summarization approach which first identifies the salient scenes in a script and then generates a summary using only those scenes. Using QA-based evaluation, we show that our model outperforms previous state-of-the-art summarization methods and reflects the information content of a movie more accurately than a model that takes the whole movie script as input.
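The two-stage approach can be sketched as: score every scene for salience, keep the top-k, and summarize only those. The keyword-overlap scorer and truncation "summarizer" below are stand-ins for the paper's trained salience model and abstractive summarizer.

```python
# Sketch of two-stage summarization: (1) select salient scenes, (2) summarize them.
def salience(scene, summary_mentions):
    """Toy scorer: how many summary keywords appear in the scene."""
    return sum(1 for w in summary_mentions if w in scene.lower())

def two_stage_summary(scenes, mentions, k=2):
    ranked = sorted(scenes, key=lambda s: salience(s, mentions), reverse=True)
    salient = ranked[:k]                        # stage 1: scene selection
    return " / ".join(s[:40] for s in salient)  # stage 2: summarize (stub)

scenes = [
    "They eat lunch quietly at the diner.",
    "The heist is planned in the warehouse.",
    "The heist goes wrong at the bank.",
]
print(two_stage_summary(scenes, {"heist", "bank"}))
```

The point of the design is that stage 2 only ever sees the selected scenes, which keeps the summarizer's input short enough for current language models.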
- Africa (0.14)
- Asia > South Korea > Busan > Busan (0.05)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Law (1.00)
ChatGPT Tutorial: How To Use ChatGPT by OpenAI
ChatGPT has taken the internet by storm. People have been using it to compose music, understand complex topics, make jokes, write movie scripts, and even debug computer code. Such is the bot's popularity that it took only five days to reach its first million users. This detailed tutorial explains precisely how to use ChatGPT. But before we delve into the details, let's first consider what ChatGPT is and what's causing the huge buzz surrounding the latest AI tool.
- Leisure & Entertainment (0.89)
- Media > Music (0.56)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)
CREATIVESUMM: Shared Task on Automatic Summarization for Creative Writing
Agarwal, Divyansh, Fabbri, Alexander R., Han, Simeng, Kryściński, Wojciech, Ladhak, Faisal, Li, Bryan, McKeown, Kathleen, Radev, Dragomir, Zhang, Tianyi, Wiseman, Sam
This paper introduces the shared task of summarizing documents in several creative domains, namely literary texts, movie scripts, and television scripts. Summarizing these creative documents requires making complex literary interpretations, as well as understanding non-trivial temporal dependencies in texts containing varied styles of plot development and narrative structure. This poses unique challenges and is yet underexplored for text summarization systems. In this shared task, we introduce four sub-tasks and their corresponding datasets, focusing on summarizing books, movie scripts, primetime television scripts, and daytime soap opera scripts. We detail the process of curating these datasets for the task, as well as the metrics used for the evaluation of the submissions. As part of the CREATIVESUMM workshop at COLING 2022, the shared task attracted 18 submissions in total. We discuss the submissions and the baselines for each sub-task in this paper, along with directions for facilitating future work in the field.
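Summarization shared tasks like this one are typically scored with ROUGE-family metrics. As a concrete illustration, here is a minimal ROUGE-1 recall computation (unigram overlap against a reference); it is not the official scorer used by the task.

```python
# Minimal ROUGE-1 recall: fraction of reference unigrams covered by the candidate.
from collections import Counter

def rouge1_recall(reference, candidate):
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    return overlap / sum(ref.values()) if ref else 0.0

print(rouge1_recall("the hero saves the town", "the hero saves everyone"))  # 0.6
```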
- North America > United States > Pennsylvania (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > United States > Maryland > Montgomery County > Gaithersburg (0.04)
- Leisure & Entertainment (0.69)
- Media > Television (0.36)
Few-Shot Character Understanding in Movies as an Assessment to Meta-Learning of Theory-of-Mind
Yu, Mo, Sang, Yisi, Pu, Kangsheng, Wei, Zekai, Wang, Han, Li, Jing, Yu, Yue, Zhou, Jie
When reading a story, humans can rapidly understand new fictional characters with a few observations, mainly by drawing analogy to fictional and real people they met before in their lives. This reflects the few-shot and meta-learning essence of humans' inference of characters' mental states, i.e., humans' theory-of-mind (ToM), which is largely ignored in existing research. We fill this gap with a novel NLP benchmark, TOM-IN-AMC, the first assessment of models' ability to meta-learn ToM in a realistic narrative understanding scenario. Our benchmark consists of $\sim$1,000 parsed movie scripts for this purpose, each corresponding to a few-shot character understanding task; it requires models to mimic humans' ability to quickly grasp characters from a few opening scenes of a new movie. Our human study verified that humans can solve our problem by inferring characters' mental states based on their previously seen movies, while the state-of-the-art metric-learning and meta-learning approaches adapted to our task lag 30% behind.
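The few-shot setup lends itself to a simple metric-learning baseline: represent each character by the centroid of a few support scenes, then assign a query scene to the nearest centroid. The bag-of-words vectors and scenes below are illustrative stand-ins for learned embeddings and real script data.

```python
# Nearest-centroid sketch of few-shot character understanding.
from collections import Counter
import math

def embed(text):
    """Stand-in embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

support = {  # a few observed scenes per character (the "few shots")
    "mentor": ["offers wise advice", "teaches the hero patience"],
    "villain": ["threatens the town", "plots revenge in the dark"],
}

def predict(query):
    """Assign the query scene to the character with the closest centroid."""
    centroids = {c: sum((embed(s) for s in texts), Counter())
                 for c, texts in support.items()}
    return max(centroids, key=lambda c: cosine(embed(query), centroids[c]))

print(predict("plots in the dark against the hero"))  # -> villain
```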
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
MovieCLIP: Visual Scene Recognition in Movies
Bose, Digbalay, Hebbar, Rajat, Somandepalli, Krishna, Zhang, Haoyang, Cui, Yin, Cole-McLaughlin, Kree, Wang, Huisheng, Narayanan, Shrikanth
Longform media such as movies have complex narrative structures, with events spanning a rich variety of ambient visual scenes. Domain specific challenges associated with visual scenes in movies include transitions, person coverage, and a wide array of real-life and fictional scenarios. Existing visual scene datasets in movies have limited taxonomies and do not consider visual scene transitions within movie clips. In this work, we address the problem of visual scene recognition in movies by first automatically curating a new and extensive movie-centric taxonomy of 179 scene labels derived from movie scripts and auxiliary web-based video datasets. Instead of manual annotations which can be expensive, we use CLIP to weakly label 1.12 million shots from 32K movie clips based on our proposed taxonomy. We provide baseline visual models trained on the weakly labeled dataset called MovieCLIP and evaluate them on an independent dataset verified by human raters. We show that leveraging features from models pretrained on MovieCLIP benefits downstream tasks such as multi-label scene and genre classification of web videos and movie trailers.
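The weak-labeling step reduces to: for each shot, take the best-scoring taxonomy label if its similarity clears a confidence threshold, else leave the shot unlabeled. The scores below are made up; in the paper they come from CLIP's image-text similarity between shot frames and label prompts.

```python
# Sketch of CLIP-style weak labeling: best label above a confidence threshold.
TAXONOMY = ["kitchen", "car chase", "courtroom"]

shot_scores = {                      # shot id -> similarity score per label (toy values)
    "shot_001": [0.81, 0.10, 0.09],
    "shot_002": [0.20, 0.75, 0.05],
    "shot_003": [0.34, 0.33, 0.33],  # ambiguous: no confident label
}

def weak_label(scores, threshold=0.5):
    """Return the best label if confident enough, else None."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return TAXONOMY[best] if scores[best] >= threshold else None

labels = {shot: weak_label(s) for shot, s in shot_scores.items()}
print(labels)  # shot_003 is left unlabeled
```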
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > New York (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)